cloud failure
The Microsoft Azure Outage Shows the Harsh Reality of Cloud Failures
The second major cloud outage in less than two weeks, Azure's downtime highlights the "brittleness" of a digital ecosystem that depends on a few companies never making mistakes.
Microsoft's Azure cloud platform, its widely used 365 services, Xbox, and Minecraft started suffering outages at roughly noon Eastern time on Wednesday, the result of what Microsoft said was "an inadvertent configuration change." The incident, which marks the second major cloud provider outage in less than two weeks, highlights the instability of an internet built largely on infrastructure run by a few tech giants. Microsoft's problems originated specifically from Azure's Front Door content delivery network and emerged just hours before Microsoft's scheduled earnings announcement. The company website, including its investor relations page, was still down on Wednesday afternoon, and the Azure status page where Microsoft provides updates was having intermittent issues as well.
- North America > United States > New York (0.06)
- North America > United States > New Mexico (0.05)
- North America > United States > California (0.05)
- Information Technology > Services (1.00)
- Information Technology > Security & Privacy (0.97)
- Leisure & Entertainment > Games > Computer Games (0.55)
Diffusion-based Time Series Data Imputation for Microsoft 365
Yang, Fangkai; Yin, Wenjie; Wang, Lu; Li, Tianci; Zhao, Pu; Liu, Bo; Wang, Paul; Qiao, Bo; Liu, Yudong; Björkman, Mårten; Rajmohan, Saravan; Lin, Qingwei; Zhang, Dongmei
Reliability is critical for large-scale cloud systems like Microsoft 365. Cloud failures, such as disk and node failures, threaten service reliability, resulting in online service interruptions and economic loss. Existing work focuses on predicting cloud failures and taking action proactively before they happen. However, these approaches suffer from poor data quality, such as missing data during model training and prediction, which limits their performance. In this paper, we focus on enhancing data quality through data imputation with the proposed Diffusion+, a sample-efficient diffusion model that imputes missing data efficiently based on the observed data. Our experiments and application practice show that our model improves the performance of the downstream failure prediction task.
- Information Technology > Data Science > Data Quality (1.00)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)